Anonymizing classification data using rough set theory

Authors

  • Mingquan Ye
  • Xindong Wu
  • Xuegang Hu
  • Donghui Hu
Abstract

Identity disclosure is one of the most serious privacy concerns in many data mining applications. A well-known privacy model for protecting against identity disclosure is k-anonymity. The main goal of anonymizing classification data is to protect individual privacy while maintaining the utility of the data for building classification models. In this paper, we present an approach based on rough sets for measuring data quality and guiding the anonymization process. First, we make use of the attribute reduction theory of rough sets and introduce conditional entropy to measure the classification data quality of anonymized datasets. Then, we extend conditional entropy under single-level granulation to hierarchical conditional entropy under multi-level granulation, and study its properties by dynamically coarsening and refining attribute values. Guided by these properties, we develop an efficient search metric and present a novel algorithm for achieving k-anonymity, Hierarchical Conditional Entropy-based Top-Down Refinement (HCE-TDR), which combines rough set theory and attribute value taxonomies. Theoretical analysis and experiments on real-world datasets show that our algorithm is efficient and improves data utility. © 2013 Elsevier B.V. All rights reserved.
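The two quantities the abstract relies on, k-anonymity and the conditional entropy of the class attribute given the quasi-identifiers, can be illustrated with a minimal sketch. This is not the HCE-TDR algorithm. It assumes the standard Shannon conditional entropy over the equivalence classes induced by the quasi-identifier attributes, which may differ in detail from the paper's definition. The toy records, the attribute names (`age`, `job`, `cls`), and the one-level age taxonomy are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def partition(rows, qi):
    """Group row indices into equivalence classes by quasi-identifier values."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in qi)].append(i)
    return list(blocks.values())


def is_k_anonymous(rows, qi, k):
    """True if every equivalence class on the quasi-identifiers has >= k rows."""
    return all(len(block) >= k for block in partition(rows, qi))


def conditional_entropy(rows, qi, label):
    """Shannon entropy of the class label given the quasi-identifier partition."""
    n = len(rows)
    h = 0.0
    for block in partition(rows, qi):
        counts = Counter(rows[i][label] for i in block)
        for c in counts.values():
            p = c / len(block)
            h -= (len(block) / n) * p * log2(p)
    return h


# Hypothetical toy data: two quasi-identifiers and one class attribute.
rows = [
    {"age": "25", "job": "engineer", "cls": "yes"},
    {"age": "27", "job": "engineer", "cls": "yes"},
    {"age": "41", "job": "lawyer", "cls": "no"},
    {"age": "45", "job": "lawyer", "cls": "yes"},
]
qi = ["age", "job"]

# Hypothetical one-level taxonomy that coarsens exact ages into ranges.
age_taxonomy = {"25": "20-29", "27": "20-29", "41": "40-49", "45": "40-49"}
generalized = [{**r, "age": age_taxonomy[r["age"]]} for r in rows]

print(is_k_anonymous(rows, qi, 2), conditional_entropy(rows, qi, "cls"))
print(is_k_anonymous(generalized, qi, 2), conditional_entropy(generalized, qi, "cls"))
```

In this toy case the raw table is not 2-anonymous but determines the class exactly (entropy 0), while the coarsened table is 2-anonymous at the cost of a higher conditional entropy (0.5 bits). That trade-off is what a search metric for anonymization has to balance.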


Similar resources

Rough Set Theory In Data Mining Ppt

Rough set theory provides a useful mathematical concept to draw tends to serve well for data mining applications whereas the predictive model. The rough set toolkit for analysis of data (ROSETTA), which is an advanced machine learning algorithms for data mining tasks implemented in Java (33). Therefore, this paper presents the RoughSets package that allows researchers. Zdzislaw Pawlak, Rough Se...


Topological structure on generalized approximation space related to n-arry relation

The classical structure of rough set theory was first formulated by Z. Pawlak in [6]. The foundation of its object classification is an equivalence binary relation and equivalence classes. The upper and lower approximation operations are two core notions in rough set theory. They can also be seen as a closure operator and an interior operator of the topology induced by an equivalence relation on a u...


Rough sets theory in site selection decision making for water reservoirs

Rough Sets theory is a mathematical approach to the analysis of vague descriptions of objects, presented by the well-known mathematician Pawlak (1982, 1991). This paper explores the use of Rough Sets theory in site location investigation of buried concrete water reservoirs. Making an appropriate decision in site location can always avoid unnecessary expensive costs which is very important in constr...


A New Approach for Knowledge Based Systems Reduction using Rough Sets Theory (RESEARCH NOTE)

The problem of knowledge analysis for decision support systems is the most difficult task in information systems. This paper presents a new approach based on notions from the mathematical theory of Rough Sets to solve this problem. Using these concepts, a systematic approach has been developed to reduce the size of the decision database and extract a reduced rule set from vague and uncertain data. The method ha...


Reduction of DEA-Performance Factors Using Rough Set Theory: An Application of Companies in the Iranian Stock Exchange

The financial management field has witnessed significant developments in recent years to help decision makers, managers and investors make optimal decisions. In this regard, institutions' investment strategies and their evaluation methods continuously change with the rapid transfer of information and access to financial data. When information is available ...



Journal:
  • Knowl.-Based Syst.

Volume 43

Publication date: 2013